A Kernel Perspective for Regularizing Deep Neural Networks
We propose a new point of view for regularizing deep neural networks by using
the norm of a reproducing kernel Hilbert space (RKHS). Even though this norm
cannot be computed, it admits upper and lower approximations leading to various
practical strategies. Specifically, this perspective (i) provides a common
umbrella for many existing regularization principles, including spectral norm
penalties, gradient penalties, and adversarial training, (ii) leads to new effective
regularization penalties, and (iii) suggests hybrid strategies combining lower
and upper bounds to get better approximations of the RKHS norm. We
experimentally show that this approach is effective when learning on small
datasets and for obtaining adversarially robust models.
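Among the strategies the abstract groups under this umbrella, the gradient penalty is the simplest to sketch. Below is a minimal PyTorch illustration (my own, not the paper's code; `model` and `lam` are placeholder names) of penalizing the norm of the network's gradient with respect to its input, one of the lower-bound approximations mentioned above:

```python
# A minimal sketch of a gradient penalty, assuming a PyTorch model whose
# output we reduce to a scalar for illustration. Not the authors' code.
import torch

def gradient_penalty(model, x, lam=0.1):
    """Return lam times the mean squared L2 norm of d model(x) / d x."""
    x = x.clone().requires_grad_(True)
    out = model(x).sum()  # scalar reduction, for illustration only
    (grad,) = torch.autograd.grad(out, x, create_graph=True)
    # create_graph=True keeps the penalty differentiable, so it can be
    # added to the training loss and backpropagated through.
    return lam * grad.flatten(1).norm(dim=1).pow(2).mean()

# Hypothetical usage inside a training step:
#   loss = criterion(model(x), y) + gradient_penalty(model, x)
#   loss.backward()
```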
Convolutional Kernel Networks for Graph-Structured Data
We introduce a family of multilayer graph kernels and establish new links
between graph convolutional neural networks and kernel methods. Our approach
generalizes convolutional kernel networks to graph-structured data, by
representing graphs as a sequence of kernel feature maps, where each node
carries information about local graph substructures. On the one hand, the
kernel point of view offers an unsupervised, expressive, and easy-to-regularize
data representation, which is useful when limited samples are available. On the
other hand, our model can also be trained end-to-end on large-scale data,
leading to new types of graph convolutional neural networks. We show that our
method achieves competitive performance on several graph classification
benchmarks, while offering simple model interpretation. Our code is freely
available at https://github.com/claying/GCKN
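To make the "sequence of kernel feature maps" concrete, here is a simplified NumPy sketch (assumptions mine, not the GCKN code): a node's local-substructure descriptor is embedded through a Nystrom approximation of a Gaussian kernel against a small set of anchor points, which in the paper's setting would be learned end-to-end or without supervision.

```python
# A simplified sketch of a Nystrom kernel feature map for node descriptors.
# U holds one descriptor per node (e.g., summarizing local substructures);
# Z is a small set of anchor points. Illustrative only, not the GCKN code.
import numpy as np

def nystrom_feature_map(U, Z, alpha=1.0, eps=1e-6):
    """Embed descriptors U (n, d) via anchors Z (m, d); returns (n, m)."""
    def gauss(A, B):  # pairwise Gaussian kernel between rows of A and B
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-alpha * d2)
    K_zz = gauss(Z, Z)                            # anchor Gram matrix (m, m)
    w, V = np.linalg.eigh(K_zz + eps * np.eye(len(Z)))
    K_zz_inv_sqrt = V @ np.diag(w ** -0.5) @ V.T  # K_zz^(-1/2)
    return gauss(U, Z) @ K_zz_inv_sqrt            # node embeddings (n, m)
```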
A Trainable Optimal Transport Embedding for Feature Aggregation and its Relationship to Attention
We address the problem of learning on sets of features, motivated by the need
to perform pooling operations over long biological sequences of varying sizes,
with long-range dependencies and possibly few labeled data. To tackle this
challenging task, we introduce a parametrized representation of fixed size,
which embeds and then aggregates elements from a given input set according to
the optimal transport plan between the set and a trainable reference. Our
approach scales to large datasets and allows end-to-end training of the
reference, while also providing a simple unsupervised learning mechanism with
small computational cost. Our aggregation technique admits two useful
interpretations: it may be seen as a mechanism related to attention layers in
neural networks, or it may be seen as a scalable surrogate of a classical
optimal transport-based kernel. We experimentally demonstrate the effectiveness
of our approach on biological sequences, achieving state-of-the-art results on
protein fold recognition and chromatin profile detection tasks, and, as a
proof of concept, we show promising results for processing natural language
sequences. We provide an open-source implementation of our embedding that can
be used alone or as a module in larger learning models at
https://github.com/claying/OTK.
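To illustrate the aggregation idea (a sketch under my own simplifications, not the released OTK code), one can compute an entropic optimal transport plan between the input set and a trainable reference with Sinkhorn iterations, then use the plan as pooling weights:

```python
# Sketch of OT-based pooling: elements of a set X are aggregated into a
# fixed number of slots given by a reference Z, with weights taken from an
# entropic transport plan. Illustrative simplification, not the OTK code.
import numpy as np

def sinkhorn_plan(C, eps=0.5, n_iter=50):
    """Entropic OT plan between two uniform measures, given costs C (n, p)."""
    n, p = C.shape
    K = np.exp(-C / eps)
    u = np.ones(n) / n
    for _ in range(n_iter):
        v = (np.ones(p) / p) / (K.T @ u)
        u = (np.ones(n) / n) / (K @ v)
    return u[:, None] * K * v[None, :]            # plan P, shape (n, p)

def ot_pooling(X, Z):
    """Aggregate X (n, d) into a fixed-size output (p, d) against Z (p, d)."""
    C = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(-1)  # squared distances
    P = sinkhorn_plan(C)
    return len(Z) * (P.T @ X)  # each output row is a weighted mean of inputs
```

Read this way, the plan P plays the role of an attention matrix: each reference element attends to the input elements it is transported to, which is the connection to attention layers noted in the abstract.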
GraphiT: Encoding Graph Structure in Transformers
We show that viewing graphs as sets of node features and incorporating
structural and positional information into a transformer architecture can
yield representations that outperform those learned with classical graph neural networks
(GNNs). Our model, GraphiT, encodes such information by (i) leveraging relative
positional encoding strategies in self-attention scores based on positive
definite kernels on graphs, and (ii) enumerating and encoding local
sub-structures such as paths of short length. We thoroughly evaluate these two
ideas on many classification and regression tasks, demonstrating the
effectiveness of each of them independently, as well as their combination. In
addition to performing well on standard benchmarks, our model also admits
natural visualization mechanisms for interpreting the graph motifs that explain its
predictions, making it a potentially strong candidate for scientific
applications where interpretation is important. Code available at
https://github.com/inria-thoth/GraphiT
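A rough sketch of idea (i), under assumptions of my own rather than the GraphiT implementation: take the diffusion kernel exp(-beta L) of the graph Laplacian as the positive definite kernel, and use it to reweight self-attention scores between nodes:

```python
# Sketch of kernel-modulated self-attention on a graph: the diffusion kernel
# acts as a relative positional bias between nodes. Illustrative only.
import numpy as np
from scipy.linalg import expm

def kernel_attention(H, A, beta=1.0):
    """H: node features (n, d); A: adjacency matrix (n, n)."""
    L = np.diag(A.sum(1)) - A                # combinatorial graph Laplacian
    K = expm(-beta * L)                      # diffusion kernel on the graph
    scores = np.exp(H @ H.T / np.sqrt(H.shape[1]))  # dot-product attention
    scores = scores * K                      # modulate by the graph kernel
    scores = scores / scores.sum(1, keepdims=True)  # row-normalize
    return scores @ H
```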
Biological Sequence Modeling with Convolutional Kernel Networks
Recurrent Kernel Networks
Substring kernels are classical tools for representing biological sequences or
text. However, when large amounts of annotated data are available, models that
allow end-to-end training, such as neural networks, are often preferred. Links
between recurrent neural networks (RNNs) and substring kernels have recently
been drawn by formally showing that RNNs with specific activation functions are
points in a reproducing kernel Hilbert space (RKHS). In this paper, we revisit
this link by generalizing convolutional kernel networks, originally related to
a relaxation of the mismatch kernel, to model gaps in sequences. The result is
a new type of recurrent neural network which can be trained end-to-end with
backpropagation, or without supervision by using kernel approximation
techniques. We experimentally show that our approach is well suited to
biological sequences, where it outperforms existing methods on protein
classification tasks.
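The gap mechanism can be pictured through the classical recurrence for gapped substring comparison. The sketch below (my own simplification; lam, alpha, and the Gaussian similarity are illustrative choices, not the authors' exact model) compares a sequence against a k-mer anchor, with a decay factor penalizing gaps; a recurrence of this kind is what gives the model its recurrent-network structure:

```python
# Sketch of a gap-decayed matching recurrence between a sequence x and a
# k-mer anchor z; one hidden-state update per sequence position, like an RNN.
# A simplified illustration, not the authors' exact formulation.
import numpy as np

def gapped_kmer_score(x, z, lam=0.5, alpha=1.0):
    """x: (n, d) position embeddings; z: (k, d) anchor; returns a scalar."""
    k = len(z)
    # kappa[t, j]: Gaussian similarity between position t of x and z_j
    kappa = np.exp(-alpha * ((x[:, None, :] - z[None, :, :]) ** 2).sum(-1))
    c = np.zeros(k + 1)
    c[0] = 1.0                    # the empty prefix always matches
    for kt in kappa:              # one update per sequence position
        prev = c.copy()
        # either skip this position (a gap, decayed by lam), or extend a
        # matched prefix of z by matching its next character here
        c[1:] = lam * prev[1:] + prev[:-1] * kt
    return c[k]                   # total gap-decayed match score for z
```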